A new (T-X^θ) family of distributions: properties, discretization and estimation with applications

In this paper, a new class of distributions, called the T-X^θ family of distributions, is proposed for bounded, (0,1), and unbounded, (0,∞), supported random variables. Some special sub-models of the proposed family are presented, and one new sub-model is selected and studied in detail. The statistical properties of the suggested family, including the quantile function, moments, moment generating function, order statistics and Rényi entropy, are discussed. The maximum likelihood method is used to estimate the parameters of the distribution, and a Monte Carlo simulation study is conducted. The discretized T-X^θ family provides many sub-families and sub-models. In addition, eight real data sets are used to demonstrate the flexibility of multiple sub-models of the proposed continuous and discrete families.

Data and their distributions are the consequences of generating models. One can imagine that real generating mechanisms are unlimited, and therefore we are faced with a great many models. Statisticians try to choose a mathematical model to fit a given data set, and the more data we have, the larger our repertoire of models should be. The huge variety of data arising in many problems requires numerous modifications of the classical distributions, producing new families of distributions. The new distributions may have new properties besides inheriting some advantages of the classical ones, which are often special cases of the newly generated distribution. This adds flexibility to the resulting distribution, which is hoped to be beneficial and to provide accurate predictions. As a result, comparisons with the classical distributions have frequently favored the newly produced ones.

Several approaches have been suggested for finding new flexible distributions. Some add one or more parameters to a baseline distribution in different ways, some use transformations, and others merge more than one distribution to obtain a new one. Reference 1 proposed the transformed-transformer family (T-X family). This method is built from three functions (R, F, and W), where R and F are the cdfs of two random variables, T and X, respectively, and the function W(.) links the support of T to the range of X. The cdf and the pdf of the T-X family are given by

$$G(x) = \int_a^{W(F(x))} r(t)\,dt = R\{W(F(x))\}$$

and

$$g(x) = \left\{\frac{d}{dx} W(F(x))\right\} r\{W(F(x))\},$$

where r(

Special models
This section introduces various new T-X^θ family models based on different X random variables. The Exponential-X^θ Exponential distribution is studied in detail and some of its features are provided. Table 3 lists several new models derived from the T-X^θ family of distributions. The Uniform-X^θ Beta distribution in Table 3 is the McDonald distribution defined by Ref. 8 with θ + 1 = c.

Table 1. Some members of the T-X family.
W(F(x)) | Range of T | The family | Author
Weighted T-X family

Properties of the E-X^θE distribution
In this subsection the E-X^θE distribution is studied in detail. Some of its properties, including the quantile function, moments, moment generating function, order statistics and entropy, are provided.

The quantile function
The quantile function (qf) of the E-X^θE distribution is obtained by equating the cdf G(x) in (3) to u, 0 < u < 1, and solving for x, so that x satisfies

$$x^{\theta}\left(1 - e^{-\alpha x}\right) = -\log(1 - u). \quad (5)$$

There is no closed form for the quantile function, so it must be obtained numerically. Putting u = 0.5 in (5) gives the median (M) of the E-X^θE distribution.

Table 2. Some cdfs of the new family based on different T distributions.
Distribution of T | cdf of T | G(x) | Range
Beta | I_t(α, β)
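Because the quantile function has no closed form, it has to be computed by root-finding. The following is a minimal Python sketch rather than the authors' implementation. It assumes that the E-X^θE cdf has the form G(x) = 1 − exp{−x^θ(1 − e^{−αx})}, which is consistent with the exponent e^{−vx^θ + vx^θ e^{−αx}} that appears in the order-statistics proof, and it uses plain bisection. The function names `exe_cdf` and `exe_quantile` are hypothetical.

```python
import math


def exe_cdf(x, alpha, theta):
    """Assumed E-X^theta E cdf: G(x) = 1 - exp(-x**theta * (1 - exp(-alpha*x)))."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-(x ** theta) * (1.0 - math.exp(-alpha * x)))


def exe_quantile(u, alpha, theta, tol=1e-12):
    """Solve G(x) = u for x by bisection (G is increasing on (0, inf))."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Expand the upper end until the bracket [lo, hi] contains the root.
    while exe_cdf(hi, alpha, theta) < u:
        hi *= 2.0
    # Halve the bracket until it is narrow relative to its location.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if exe_cdf(mid, alpha, theta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Median for illustrative parameter values (alpha = 1, theta = 2).
median = exe_quantile(0.5, alpha=1.0, theta=2.0)
```

Feeding uniform random numbers u into `exe_quantile` would give an inverse-cdf sampler of the kind used in a simulation study, again under the assumed cdf.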

pdf expansion
Here the pdf is expanded, which is useful for calculating properties of the E-X^θE distribution such as the moments and the moment generating function. A series expansion is applied to the pdf in (4), and the resulting expanded form of the pdf is referred to as (6).

The moments
If X is a random variable distributed as E-X^θE(α, θ), then the nth ordinary moment of the E-X^θE distribution is obtained from the pdf expansion in (6). The central moments, and hence the mean and the variance of the E-X^θE distribution, are obtained from the ordinary moments. The numerical values of the mean (µ) and variance (σ^2) of the E-X^θE distribution are listed in Table 4 for selected values of α and θ.
From Table 4, one can note that the mean and the variance of the E-X^θE distribution decrease as θ or α increases: for a fixed value of one parameter, both decrease as the other parameter increases.
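The mean and variance can also be checked numerically without the series expansion. The sketch below is an illustration under stated assumptions, not the method of the source: it assumes the survival function S(x) = exp{−x^θ(1 − e^{−αx})}, uses the identity E[X^n] = n∫₀^∞ x^{n−1} S(x) dx for non-negative X, and approximates the integral with composite Simpson's rule on [0, upper]. The names `exe_survival`, `raw_moment` and `mean_and_variance` are hypothetical, and `upper` must be chosen so that the tail beyond it is negligible and `x ** theta` does not overflow.

```python
import math


def exe_survival(x, alpha, theta):
    """Assumed E-X^theta E survival function S(x) = exp(-x**theta * (1 - exp(-alpha*x)))."""
    if x <= 0.0:
        return 1.0
    return math.exp(-(x ** theta) * (1.0 - math.exp(-alpha * x)))


def raw_moment(n, alpha, theta, upper=50.0, steps=20000):
    """Approximate E[X^n] = n * integral_0^upper x^(n-1) S(x) dx by Simpson's rule."""
    if steps % 2:          # Simpson's rule needs an even number of sub-intervals
        steps += 1
    h = upper / steps

    def integrand(x):
        return n * x ** (n - 1) * exe_survival(x, alpha, theta)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h)
    return total * h / 3.0


def mean_and_variance(alpha, theta, upper=50.0):
    """Mean and variance from the first two raw moments."""
    m1 = raw_moment(1, alpha, theta, upper)
    m2 = raw_moment(2, alpha, theta, upper)
    return m1, m2 - m1 ** 2


mu, sigma2 = mean_and_variance(alpha=1.0, theta=2.0, upper=10.0)
```

For very large α the assumed survival function approaches exp(−x^θ), so with θ = 2 the mean should be close to √π/2, which gives a simple sanity check of the quadrature.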

The skewness and kurtosis based on moments
The skewness (Sk) measures the degree of asymmetry of the distribution, while the kurtosis (Ku) measures its peakedness. Both are computed for the E-X^θE distribution from its moments. Table 5 shows numerical values of the skewness (Sk) and kurtosis (Ku) of the E-X^θE distribution for some values of α and θ.
From the results in Table 5, it is observed that the E-X^θE distribution covers different pdf shapes.
For a fixed value of α, the E-X^θE distribution is highly peaked (leptokurtic) for 3 < θ < 5, but it is neither too peaked nor too flat-topped (mesokurtic) for 3 < θ < 10.
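As an illustration of how moment-based skewness and kurtosis are computed, the sketch below converts the first four raw moments into Sk = µ₃/σ³ and Ku = µ₄/σ⁴. These are the standard moment-based definitions and are assumed here, since the formulas of the source did not survive extraction; the function name `skewness_kurtosis` is hypothetical.

```python
import math


def skewness_kurtosis(m1, m2, m3, m4):
    """Moment-based skewness and kurtosis from the raw moments E[X], ..., E[X^4]."""
    var = m2 - m1 ** 2
    if var <= 0.0:
        raise ValueError("the raw moments imply a non-positive variance")
    # Central moments expressed through raw moments.
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    sk = mu3 / var ** 1.5
    ku = mu4 / var ** 2
    return sk, ku


# Check on a known case: the unit exponential has raw moments n!.
sk, ku = skewness_kurtosis(1.0, 2.0, 6.0, 24.0)
```

Under this convention a kurtosis near 3 is mesokurtic and a value above 3 is leptokurtic, which matches the terminology used in the text.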

The moment generating function
The moment generating function M_X(t) of the E-X^θE distribution can be obtained using the expansion in (6). The nth moment is then found by differentiating M_X(t) n times and setting t = 0 in the result; that is,

$$E(X^n) = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0}.$$

Table 4. Mean and variance of the E-X^θE distribution for some values of α and θ.

Order statistics
Let X_1, ..., X_k be k independent random variables from a distribution with pdf g(x) and cdf G(x). According to Ref. 9, the pdf of the rth order statistic is given by

$$g_{r:k}(x) = \frac{k!}{(r-1)!\,(k-r)!}\, g(x)\,[G(x)]^{r-1}\,[1-G(x)]^{k-r}. \quad (7)$$

Using the binomial expansion

$$[G(x)]^{r-1} = \sum_{l=0}^{r-1} (-1)^{l} \binom{r-1}{l} [1-G(x)]^{l} \quad (8)$$

and inserting (8), (3) and (4) in (7) gives the pdf of the rth order statistic from the T-X^θ family of distributions.

Let X_1, ..., X_k be a random sample from the E-X^θE distribution. The pdf of the rth order statistic, X_{r:k}, for the E-X^θE distribution is given by the form in (9).

Proof. The pdf of the rth order statistic, X_{r:k}, for the E-X^θE distribution can be written as a sum over l of terms involving e^{−v x^θ + v x^θ e^{−αx}}, where v = l + k − r + 1. Applying the exponential series expansion to e^{−v x^θ + v x^θ e^{−αx}} yields the form in (9).

Entropy
In information theory, entropy can be regarded as a measure of a system's degree of uncertainty. It has wide applications in economics, physics, weather science and sociology. In this section, the Rényi entropy is evaluated for the E-X^θE distribution.

Rényi entropy
Reference 10 defined the Rényi entropy, which is considered a generalization of the Hartley, Shannon, collision and min-entropy. The Rényi entropy of the T-X^θ family of distributions, referred to as (10), is obtained from the definition by applying a binomial expansion. The Rényi entropy of a random variable X following the E-X^θE distribution is given in Eq. (11).

Proof. Applying the result in (10), the Rényi entropy of the E-X^θE distribution is written as an integral of a series. Assuming that summation and integration can be interchanged, the result in Eq. (11) is obtained from the last form by using the gamma function.

Parameter estimation and simulation study
In the first subsection, maximum likelihood estimation (MLE) of the parameters of the E-X^θE distribution is discussed. A simulation study is then presented.

Maximum likelihood (MLE)
Let X_1, X_2, ..., X_n be a random sample from the E-X^θE distribution. The log-likelihood function corresponding to the pdf in (4) is referred to as (12), and its partial derivatives with respect to α and θ as (13) and (14). The MLEs of α and θ are obtained by setting (13) and (14) to zero and solving the resulting equations numerically using an iterative technique such as Newton-Raphson.
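The numerical maximization can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the pdf g(x) = x^{θ−1}[θ(1 − e^{−αx}) + αx e^{−αx}] exp{−x^θ(1 − e^{−αx})}, obtained by differentiating the cdf form implied by the order-statistics proof, and it replaces Newton-Raphson with a derivative-free compass search on (log α, log θ), which keeps both parameters positive. The names `exe_loglik` and `exe_mle` are hypothetical, and the sample consists only of the first twelve waiting times listed for data set I, used purely as toy input.

```python
import math


def exe_loglik(alpha, theta, data):
    """Log-likelihood under the assumed E-X^theta E pdf (see lead-in)."""
    total = 0.0
    try:
        for x in data:
            e = math.exp(-alpha * x)
            total += ((theta - 1.0) * math.log(x)
                      + math.log(theta * (1.0 - e) + alpha * x * e)
                      - (x ** theta) * (1.0 - e))
    except (OverflowError, ValueError):
        # Treat numerically invalid parameter values as infinitely bad.
        return float("-inf")
    return total


def exe_mle(data, start=(1.0, 1.0), step=0.5, tol=1e-6):
    """Maximise the log-likelihood by compass search on (log alpha, log theta)."""
    a, t = math.log(start[0]), math.log(start[1])
    best = exe_loglik(math.exp(a), math.exp(t), data)
    while step > tol:
        improved = False
        for da, dt in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = exe_loglik(math.exp(a + da), math.exp(t + dt), data)
            if cand > best:          # accept only strict improvements
                a, t, best, improved = a + da, t + dt, cand, True
        if not improved:             # no better neighbour: refine the step
            step /= 2.0
    return math.exp(a), math.exp(t), best


# Toy input: the first twelve waiting times listed for data set I.
sample = [0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7, 2.9, 3.1]
alpha_hat, theta_hat, loglik_hat = exe_mle(sample)
```

A compass search is slower than Newton-Raphson but needs no derivatives and cannot step outside the parameter space, which makes it a convenient cross-check for estimates obtained from the score equations.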

Interval estimation
The second derivatives of the log-likelihood function with respect to α and θ are given in Eqs. (16) and (17). The Hessian matrix H can then be obtained using Eqs. (13), (16) and (17).
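Once the Hessian has been evaluated at the MLEs, approximate confidence intervals can be formed. The interval formula of the source did not survive extraction, so the sketch below assumes the standard asymptotic-normal (Wald) interval, estimate ± z·sqrt of the corresponding diagonal entry of (−H)^{−1}, for a two-parameter model. The function name `wald_intervals` and the numerical Hessian in the example are hypothetical.

```python
import math


def wald_intervals(estimates, hessian, z=1.959964):
    """Wald intervals for two parameters from the 2x2 Hessian of the log-likelihood."""
    (h11, h12), (h21, h22) = hessian
    # Observed information matrix is the negative Hessian (symmetrised).
    i11, i12, i22 = -h11, -0.5 * (h12 + h21), -h22
    det = i11 * i22 - i12 * i12
    if i11 <= 0.0 or det <= 0.0:
        raise ValueError("the negative Hessian is not positive definite")
    # Diagonal of the inverse information matrix gives the variances.
    variances = (i22 / det, i11 / det)
    intervals = []
    for est, var in zip(estimates, variances):
        half_width = z * math.sqrt(var)
        intervals.append((est - half_width, est + half_width))
    return intervals


# Hypothetical values: estimates (alpha, theta) = (1.0, 2.0) and a diagonal Hessian.
ci_alpha, ci_theta = wald_intervals((1.0, 2.0), ((-4.0, 0.0), (0.0, -25.0)))
```

For parameters restricted to be positive, a lower limit below zero would usually be truncated at zero or the interval built on the log scale; that choice is not specified in the surviving text.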

Simulation study
In this subsection, a Monte Carlo simulation study is presented to demonstrate the effectiveness of the ML approach for estimating the E-X^θE distribution parameters (α and θ). The steps of the simulation procedure are as follows:
1. Set the parameter values (α, θ) to (2, 0.1), (0.1, 2), (0.5, 1.5), (1.2, 1.5), (1, 0.2), (0.3, 0.2), (0.6, 0.5), (0.1, 0.2), (1, 2), and (0.2, 1).
2. Use the quantile function of the E-X^θE distribution, defined in (5), to generate a random sample of size n, where n = 20, 45, 60, 90, 120, 150, and 200.
3. Compute the MLEs of the parameters α and θ from the data generated in step 2.
4. Compute the biases and the root mean squared errors (RMSE) of the estimates.
The biases, root mean squared errors and variances of the estimates of α and θ are reported in Table 6. In general, as expected, the biases, RMSEs and variances decrease as the sample size increases, which suggests that the MLE is a reliable approach for estimating the E-X^θE parameters: the estimates behave as asymptotically unbiased and consistent.
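Step 4 of the procedure above can be sketched as follows. The formulas of the source were lost in extraction, so the usual definitions are assumed: bias is the average of (estimate − true value) over the replications, and RMSE is the square root of the average squared error. The function name `bias_rmse` and the replicate values are hypothetical.

```python
import math


def bias_rmse(estimates, true_value):
    """Bias, RMSE and variance of a list of replicated estimates."""
    n = len(estimates)
    if n == 0:
        raise ValueError("at least one estimate is required")
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    # Decomposition: MSE = variance + bias^2 (variance with divisor n).
    variance = mse - bias * bias
    return bias, math.sqrt(mse), variance


# Hypothetical replicated estimates of a parameter whose true value is 2.0.
bias, rmse, variance = bias_rmse([1.0, 3.0], true_value=2.0)
```

In a full study this function would be applied, for every parameter setting and sample size, to the MLEs obtained from the repeated samples of steps 2 and 3.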

Discrete T-X^θ family
The statistical literature contains many techniques for discretizing a continuous family of distributions. One of these techniques is based on the survival function. Following Ref. 11, the survival function of a discrete lifetime distribution is defined as S(x) = P(X ≥ x), x = 1, 2, ..., with S(0) = 1, and the probability mass function (pmf) is

$$P(X = x) = S(x) - S(x+1). \quad (18)$$

The new family is generated by discretizing the continuous cdf in (1) using the form in (18). Based on the resulting pmf of the discrete T-X^θ family, with different T distributions, Table 7 contains some new discrete sub-families of the discrete T-X^θ family.
From Table 7, for the DW-X^θ family, we note that:
• When the shape parameter equals 1, the DW-X^θ family reduces to the discrete Exponential family of distributions (DE-X^θ) with parameters 1/β and θ.
• When the shape parameter equals 2, the DW-X^θ family reduces to the discrete Rayleigh family of distributions (DR-X^θ) with parameters β^2 and 2θ.
• The DW-X^θ family can be considered as the DE-X^θ family with an exponentiated F(x), i.e., θ = θ⋆, β = β⋆, and the shape parameter is the exponentiation parameter.
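The survival-function discretization used to build the discrete family can be illustrated with a short sketch. It assumes the standard relation P(X = x) = S(x) − S(x + 1) for S(x) = P(X ≥ x) and x = 0, 1, 2, ..., applied to a continuous survival function. The example survival function exp{−x^θ(1 − e^{−αx})} and the names `discretize` and `example_survival` are illustrative assumptions and are not the three-parameter DEE parameterisation of the source.

```python
import math


def discretize(survival, x):
    """pmf at the integer x from a continuous survival function: S(x) - S(x + 1)."""
    return survival(x) - survival(x + 1)


def example_survival(x, alpha=1.0, theta=1.5):
    """Illustrative continuous survival function exp(-x**theta * (1 - exp(-alpha*x)))."""
    if x <= 0:
        return 1.0
    return math.exp(-(x ** theta) * (1.0 - math.exp(-alpha * x)))


# pmf on 0, 1, ..., 49; the terms telescope to S(0) - S(50).
pmf = [discretize(example_survival, x) for x in range(50)]
total_mass = sum(pmf)
```

Because the sum telescopes, the pmf values add up to 1 − S(50), which is numerically indistinguishable from 1 here, confirming that the construction yields a proper discrete distribution.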

The discrete Exponential-X^θ (DE-X^θ) family
This subsection gives the cumulative distribution function (cdf), the probability mass function (pmf), and the survival and hazard rate functions of the DE-X^θ family of distributions. With different X random variables, many distributions can be generated as members of the DE-X^θ family, as shown in Table 8.

The DE-Exponential^θ (DEE) distribution
The DEE distribution is derived here as an example from Table 8. Substituting the Exponential cdf with parameter β for F(x) in (20) and (21) gives the pmf of the three-parameter DEE distribution, together with its survival and hazard functions.

Applications to real data
The applications in this section are divided into two subsections. The first contains applications of the continuous distributions displayed in Table 3, while the second deals with count data matched to the discrete distributions presented in Table 8.

Using continuous data
In this subsection, multiple models of the T-X^θ family are fitted to four different data sets. These examples demonstrate the flexibility of the new family members compared with a variety of distributions. The parameters of each model are estimated using the maximum likelihood method. To compare the distributions, three criteria are calculated: the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the corrected Akaike information criterion (AICc),

$$\mathrm{AIC} = 2k - 2L(\theta; x), \qquad \mathrm{BIC} = k\log(n) - 2L(\theta; x), \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1},$$

where L(θ; x) denotes the log-likelihood of the model, k is the number of parameters and n is the sample size. In general, the model that best fits the data is the one with the highest log-likelihood and p-value and the lowest AIC, AICc, BIC, and Kolmogorov-Smirnov (K-S) values. Mathematica was used for all of the required computations and figures. Table 9 shows the distributions that have been fitted to the data for comparison.
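The three information criteria can be computed directly from the maximized log-likelihood. The sketch below uses the standard definitions (AIC = 2k − 2 log L, BIC = k log n − 2 log L, AICc = AIC + 2k(k + 1)/(n − k − 1)); the function name `information_criteria` and the numerical inputs are hypothetical and do not come from the paper's tables.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, BIC, AICc) for log-likelihood loglik, k parameters, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2.0 * k - 2.0 * loglik
    bic = k * math.log(n) - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    return aic, bic, aicc


# Hypothetical fit: log-likelihood -100 with 2 parameters and 50 observations.
aic, bic, aicc = information_criteria(-100.0, k=2, n=50)
```

All three criteria penalize the number of parameters, so a model with more parameters is preferred only when the gain in log-likelihood outweighs the penalty.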

Data set I
The data set gives the waiting times (in minutes) before service for 100 bank customers, which were analyzed by Ref. 16 for fitting the Lindley distribution. The data are: 0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7, 2.9, 3.1, 3. The estimated parameter values and the goodness-of-fit measures for these data are presented in Table 10. According to the results in this table, the G-X^θL, E-X^θE, G-X^θE, E-X^θG, and L-G{F} distributions are fitted to these data. The G-X^θL distribution is the best among the competing models, as it has the greatest p-value and the smallest values of the other goodness-of-fit statistics. These findings are supported by Fig. 4, which shows the empirical cdf and the histogram of data set I together with the fitted competing models.

Data set II
The data set comes from Ref. 17 and represents the times between failures for repairable components. The parameter estimates and the goodness-of-fit statistics for the E-X^θE, G-X^θL, G-X^θE, Bu-X^θE, CIR and PIL distributions are listed in Table 11. All competing distributions fit these data well, with p-values greater than 0.05. However, the best model for these data is the E-X^θE model, which has the smallest values of the -ll, AIC, BIC, AICc and K-S statistics, as well as the highest p-value of all the examined models. Figure 5 supports the results in Table 11.

Table 7. Some pmfs of the new family based on different T distributions.

Table 8. Some pmfs of the DE-X^θ family based on different X distributions.

Data set III
The data set was acquired from Ref. 18. Figure 6 depicts the empirical cdf and the histogram of data set III, compared with the cdfs and pdfs of the E-X^θE, E-X^θG, G-X^θE, Bu-X^θE and CIR distributions. Table 12 displays the estimated parameters as well as the goodness-of-fit values. As seen in Table 12 and Fig. 6, the G-X^θE distribution is chosen as the best model for these data because it has the lowest goodness-of-fit statistics and the greatest p-value (equal to 1) of all the competing distributions.

Data set IV
The data show the remission times (in months) of 128 bladder cancer patients. Reference 15

The summary statistics for this data are shown in Table 16. Figure 9 depicts the fitted cdf plots of the proposed models compared with the DEETE, INH, DEOWE, DBiEXII, PMiD and Geometric distributions for data set VI. The estimated parameter values and the goodness-of-fit measures are presented in Table 16. According to the results in this table, the DEE, DEW, DEB, DEL, DEFr, DEG and DEOWE distributions are fitted to the data. The DEE distribution has the greatest p-value among the competing models, while the DEG distribution has the smallest values of the other goodness-of-fit statistics, as shown in Table 16.

Data set VII
These data are derived from a laboratory study of male mice that were exposed to 300 roentgens of radiation at 5-6 weeks of age. The data describe deaths from causes other than the two primary causes, thymic lymphoma and reticulum cell sarcoma. These data were examined by Ref. 26. Table 17 shows the estimated values of the parameters as well as the goodness-of-fit statistics.
This table shows that, with the exception of the INH and Geometric distributions, all distributions fit these data well, with p-values greater than 0.05. However, the PMiD distribution is the best among the competing models, as it has the greatest p-value, as shown in Table 17.

Data set VIII
These data are the yields from 70 consecutive runs of a batch chemical process (see Ref. 25). The data are 23 23 The empirical and estimated cdfs for these data are shown in Fig. 11. The estimated parameters and goodness-of-fit measures are included in Table 18.
This table shows that the DEE, DEW, DEB, DEL, DEFr, DEG, DEETE, and DEOWE distributions fit these data quite well, with p-values larger than 0.05. The DEOWE distribution is selected as the best model for these data because it has the smallest -ll, AIC, BIC, and χ² values and the highest p-value.

Summary and conclusion
In this research, a new, highly flexible way of generating distributions is developed, which can be useful in modeling real data in various fields. The new family of distributions, called the T-X^θ family, is proposed with an extra shape parameter θ for bounded and positive unbounded random variables. Several specific sub-families and sub-models of the proposed family are presented, including the Exponential-X^θ Exponential distribution, which was selected and studied in detail. The parameters of this distribution were estimated using the maximum likelihood method, and a simulation study was conducted to investigate the efficiency and behavior of the estimates. The discretized T-X^θ family of distributions was also proposed, and some discrete models of the family were defined. As an example, the three-parameter discrete Exponential Exponential (DEE) distribution was derived, and various graphs of its pmf and hazard functions were shown. Eight real data sets were used to compare some continuous and discrete members of the proposed family with other distributions. In general, the results indicate that the proposed distributions are highly flexible and can fit various types of data; the new models achieved a closer fit for most of the data sets.
t) and R(t) are the pdf and cdf of the random variable T ∈ [a, b], where −∞ ≤ a < b ≤ ∞. The function W(.) is monotonically non-decreasing with range [a, b].
1. W(F(x)) ∈ [a, b].

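x^θ F(x) | (0, 1) and (0, ∞) | T-X^θ family | Proposed

The Exponential-X^θ Exponential
Let the random variable T follow the Exponential distribution with parameter equal to 1, and let F(x) be the cdf of the Exponential distribution with parameter α. Then the cdf and the pdf of the Exponential-X^θ Exponential (E-X^θE) distribution are

$$G(x) = 1 - \exp\{-x^{\theta}(1 - e^{-\alpha x})\}, \quad x > 0,$$

and

$$g(x) = x^{\theta-1}\left[\theta(1 - e^{-\alpha x}) + \alpha x e^{-\alpha x}\right]\exp\{-x^{\theta}(1 - e^{-\alpha x})\},$$

and the survival and hazard functions are

$$S(x) = \exp\{-x^{\theta}(1 - e^{-\alpha x})\}$$

and

$$h(x) = x^{\theta-1}\left[\theta(1 - e^{-\alpha x}) + \alpha x e^{-\alpha x}\right].$$

For different values of α and θ, the pdf and the hazard function of the E-X^θE distribution are plotted in Fig. 1.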

Figure 2 displays some possible pmf shapes of the DEE distribution. The hazard rate function may have an increasing or decreasing shape, as shown in Fig. 3.

Figure 2. The pmf of the DEE distribution for different parameter values.

Figure 4. (a) The empirical and the estimated cdf for data set I. (b) The histogram and estimated pdf for data set I.

Figure 8. The empirical cdfs of some fitted distributions for data set V.

Figure 10 shows the fitted cdf plots of the suggested models compared with the DEETE, INH, DWMOE, DBiEXII, PMiD and Geometric distributions for data set VII.

Table 5. Skewness and kurtosis of the E-X^θE distribution for some values of α and θ.

Table 6. Summary of the MLE simulation results for the E-X^θE distribution for some values of α and θ.

Table 11. Goodness-of-fit statistics for data set II.

Table 17. Parameter estimates and goodness-of-fit statistics for data set VII.